Efficient formalism for large scale ab initio molecular dynamics 
based on time-dependent density functional theory 
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A new "on the fly" method to perform Born-Oppenheimer ab initio molecular dynamics (AIMD) 
is presented. Inspired by Ehrenfest dynamics in time-dependent density functional theory, the 
electronic orbitals are evolved by a Schrodinger-like equation, where the orbital time derivative is 
multiplied by a parameter. This parameter controls the time scale of the fictitious electronic motion 
and speeds up the calculations with respect to standard Ehrenfest dynamics. In contrast to other 
methods, wave function orthogonality needs not be imposed as it is automatically preserved, which 
is of paramount relevance for large scale AIMD simulations. 
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Ab initio molecular dynamics (AIMD) on the ground 
state Born-Oppcnhcimcr (gsBOMD) potential energy 
surface for the nuclei has become a standard tool for sim- 
ulating the conformational behaviour of molecules, bio- 
and nano-structures and condensed matter systems from 
first principles Q. However, gsBOMD (in the DFT 
picture) requires that the Kohn-Sham (KS) energy func- 
tional be minimized for each value of the nuclei posi- 
tions. As this minimization can be very demanding, Car 
and Parrinello (CP) [3] proposed an elegant and efficient 
"on the fly" scheme in which the KS orbitals are propa- 
gated with a fictitious dynamics that mimics gsBOMD. 
The CP method has had a tremendous impact in many 
scientific areas 0, Nevertheless, the numerical cost 
of AIMD hinders the application of the method to large 
scale simulations, such as those of interest in biochem- 
istry or material science. Recently, new methods that 
allow larger systems and longer simulation times to be 
studied have been reported Q, but the cost associated 
with the wave function orthogonalization is still a poten- 
tial bottleneck for both gsBOMD and CP. 

Time-dependent density functional theory (TDDFT) 
0, HI has been for a long time recognized as an 
orthogonalization-free alternative for both ground state 
Q and excited state AIMD. In its simplest implementa- 
tion, Ehrenfest TDDFT, the ions are treated classically 
following electronic Hellmann-Feynman forces. For sys- 
tems where the gap between the ground and the first 
excited state is large, Ehrenfest tends to gsBOMD and 
can mimic adiabatic dynamics [l[. However, the rapid 
movement of the electrons in TDDFT requires the use of 
a very small time step, which, in many occasions, renders 
its numerical application non-practical [Io| . 

In this letter, we borrow some of the ideas of CP and 
introduce a new TDDFT Ehrenfest dynamics that re- 



duces the cost of AIMD simulations while keeping the 
accuracy of the results in tolerable levels, similar to CP. 
The whole scheme can be obtained from the following La- 
grangian (atomic units are used throughout this paper): 

N , 

C = 1 f E / (to - to) dr + Kj- E[4>, R] , (1) 

where Ki = \Yli Mf-Rr ■ Ri is the kinetic energy of 
the nuclei, Mj their masses and E the KS energy. Note 
that the major modification with respect to TDDFT is 
the scaling of the electronic velocities by a parameter fi 
(TDDFT is recovered when /j, = 1). We show in what fol- 
lows that, in the — > limit, the trajectories of the sys- 
tem approach gsBOMD, and practical calculations can be 
done for values of y, 3> 1, thus allowing for more efficient 
implementations than TDDFT while retaining its advan- 
tageous properties: the conservation of the total energy 
and of the orthogonality of the orbitals. Also, from the 
computational point of view, the new scheme is simple 
and can be easily incorporated into existing codes. 

The equations of motion obtained from fTJ) for the elec- 
tronic (<f> j) and nuclear (Ri) degrees of freedom are: 
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MjRi = -ViE[cb,R} , 



(2b) 



where v e g is the time-dependent KS effective potential. 

In contrast to CP, the new dynamics conserves the 
physical energy i? p h ys : = Ki + E[(j>, R] as well as the 
scalar product among the orbitals <pj. The first is a di- 
rect consequence of C being linear in the velocities <f>j 
and (j)*, and not depending explicitly on t. The conserva- 
tion of the scalar product requires more attention due to 
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the nonlinear character of the term 6E/6<p*j. To prove it, 
note that E[<j), R] is invariant under any unitary transfor- 
mation mixing the orbitals 4> — * U(f>, with U = e^'^W^, 
being A an N x TV Hermitian matrix. From this invari- 
ance and eq. @, we have 



Ajk-rr&k ~ Ajk4>j 



dr 



,R 



= 



(3) 
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Now, since J Ajk4>*4>k dr is constant for all A — A\ the 
scalar product of any pair <f>j, <fik is a constant as well. 
Hence, if we start from an orthonormal set, we will not 
have to reorthonormalize the orbitals during the MD sim- 
ulation. Numerically, this means that the formal scal- 
ing of the new scheme is quadratic with the number of 
atoms, while for CP and gsBOMD it is cubic [2(| due 
to the orthogonalization. In addition, the time propaga- 
tion is naturally parallelizablc by distributing the orbitals 
among different processors, as the evolution of each or- 
bital is almost independent from the others'. 

An important question is whether the new method re- 
produces gsBOMD. We show that the fi — » limit ac- 
counts for this solution. To do so, we recall that the BO 
Lagrangian reads 



C BO = K I -E[(t>,R] 



jk 



■ dr — S 



(4) 



where A^° are the Lagrange multipliers which ensure 
the orthonormality of the orbitals. Clearly, as the or- 
thonormality is automatically satisfied by the propaga- 
tor in our approach, the limit [i — > gives the BO La- 
grangian without the A BO term. Note, however, that 
one could have started from a different Lagrangian £ = 
£ + Ejfe ( / 4>j4>kdr — djk) for which the \i — > limit 
is £bOi and then, using (for p / 0) the gauge symme- 
try of £' O' = c lA (f> and A' = e iA Ae"^ - ine lA ±e~ lA ), 
where A is a time-dependent Hermitian matrix, one can 
send A' to zero recovering the dynamics of C. This limit 
has a simple physical interpretation. The effect of /i is 
to scale the TDDFT excitation energies by a factor. 
So for [i > 1 the gap of the artificial system is decreased, 
increasing the non-adiabatic coupling, while for small val- 
ues fi the excited states are pushed up in energy forcing 
the system to stay in the adiabatic regime 21 1 . 

Next, to provide an estimation of the performance im- 
provements of our method in comparison with Ehrcn- 
fest dynamics, we write the left hand side of (|2a|) as 
fi(dcf)/dt) — d<f)/dt e . With this transformation, (|2a|) can 
be seen as a standard TDDFT propagation, and we find 
that the maximum time step for our method in terms 



of /x is At — (J.At e , where At e is the maximum elec- 
tronic time step, determined by the system and the prop- 
agation scheme. In the case of CP, on the other hand, 
At cx y/Hcp- Note, however, that this difference does 
not imply anything about the relative performance of 
both methods, since the two parameters are not directly 
comparable (e.g., they have different dimensions). The 
dependence of the accuracy on /x must also be taken into 
account, as we show later. Additionally, the ionic motion 
imposes a constraint in the maximum value of At, but 
usually this limit is much higher. 

Now, although our method approaches the reference 
gsBOMD as /x — > 0, this limit is not practical from a 
numerical point of view because it implies a time step 
At — > 0. But, as /x = 1 is already close to gsBOMD for 
large gap systems, we shall mainly focus on how close we 
can stay to this limit for ^> 1. In this regime, numerical 
simulations are in principle \i times faster than standard 
TDDFT, so we first made a detailed study of how large 
can fj, be in CP and Ehrenfest for a simple 2-band model 
(see suplementary material 19]), to conclude that the 
new scheme shows a performance similar to CP. 

To further investigate in real systems the efficiency of 
this new approach, we implemented it, together with CP, 
in the first principle Octopus code ll| . For Ehrenfest dy- 
namics, the Approximated Enforced Time Reversal Sym- 
metry method [IH is used to propagate the electronic 
wave functions. In the case of CP the electronic part 
is integrated by a RATTLE/ Velocity Verlet algorithm 
described in Ref. O- In both cases, velocity Verlet algo- 
rithm is used for the ionic equations of motion. The ions 
are represented using norm-conserving pseudo-potentials 
and the exchange correlation term is approximated by 
the Adiabatic LDA functional. 

With respect to implementation there are some differ- 
ences to remark. For Ehrenfest, the propagation must 
be performed using complex wave functions while for CP 
it can be performed using real wave functions for finite 
systems or for gamma point super-cell calculations in pe- 
riodic systems. Also, due to the second order dynamic of 
CP, two sets of wave functions must be propagated while 
only one is needed for Ehrenfest. Finally, in the velocity 
Verlet algorithm, a temporary third set of wave functions 
is required to store the previous time step. 

In parallel architectures, CP methods are known to 
scale very well based on domain descomposition 
This also applies to Ehrenfest dynamics and, on top of 
that, we can add a new level of parallelization by dis- 
tributing groups of different states among processors. As 
the evolution of each state is independent, this is a very 
effective approach where communication is only required 
to calculate quantities that involve sums over all states, 
like the density or the forces. As these operations are 
performed only once per time step, it can scale efficiently 
even over slow interconnections. In the case of CP, due 
to orthogonalization between states, this parallelization 
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At = 1 398 961 1209 1623 3058 
IX = 5 396 958 1204 1620 3040 
IX = 10 391 928 1185 1611 2969 
H= 15 381 938 1181 1597 2862 

TABLE I: Selected vibrational frequencies (in cm' 1 ) for the 
benzene molecule, obtained using different values of fx. 
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scheme is more complex to implement 
much more communication. 

The first real system we simulated was the Nitrogen 
molecule (see supplementary material 0). We observed 
that, for /j, = 20, the simulation remains steadily close to 
the BO potential energy surface, and there is only a 3.4% 
deviation of the vibrational frequency. For fx — 30 the 
system starts to strongly separate from the gsBO surface 
by mixing with higher BO surfaces. 

Next, we applied the method to the benzene molecule. 
We set-up the atoms in the equilibrium geometry with 
a random Maxwell-Boltzmann distribution for 300°K. 
Each run was propagated for a period of time of ~ 400 
[fs] with a time step of \x x 0.001 [fs] (that provides a 
reasonable convergence in the spectra). Vibrational fre- 
quencies were obtained from the Fourier transform of the 
velocity auto-correlation function. In table U we show 
some low, medium and high frequencies of benzene as a 
function of fx. The general trend is a red-shift of the fre- 
quencies with a maximum deviation of 7% for fx = 15. 
Still, to make a direct comparison with experiment, we 
computed the infrared spectra as the Fourier transform 
of the electronic dipole operator. In Fig. [TJ we show how 
the spectra changes with /i. For large zx, besides the red- 
shift, spurious peaks appear above the higher vibrational 
frequency (not shown). We performed equivalent CP cal- 
culations for different values of /xcp, and found that, as 
shown in Fig.[T] it is possible to compare the physical er- 
ror induced in both methods and establish and a relation 
between /i and /xcp- 

Having established this link, we address the numerical 
performance of our new method compared to CP in terms 
of system size. To do this, we simulate several benzene 
molecules in a cell. For the new scheme, a value of \x = 15 
is used while for CP ficp = 750, (values that yield a sim- 
ilar deviation from the BO surface, according to Fig. [TJ. 
The time steps used are 3.15 [a.u] and 7.26 [a.u.] re- 
spectively. The computational cost is measured as the 
simulation time required to propagate one atomic unit of 
time. We performed the comparison both for serial and 
parallel calculations; the results are shown in Fig. O In 
the serial case, CP is 3.5 times faster for small systems, 
but the difference reduces to only 1.7 times faster for the 
larger ones. Extrapolating the results, we predict that 
the new dynamics will become less demanding than CP 
for around 1100 atoms. In the parallel case, the differ- 
ence is reduced, CP being only 2 times faster than our 
method for small systems, and with a crossing point be- 
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FIG. 1: Calculated infrared spectrum for benzene for different 
values of li, compared to CP dynamics and to experiment [lq ]. 



low 750 atoms. This is due to the better scalability of 
the Ehrenfest approach, as seen on Fig. Moreover, 
memory requirements for our approach are lower than 
for CP: in the case of 480 atoms the ground state cal- 
culation requires a maximum of 3.5 GB, whereas in the 
MD, Ehrenfest requires 5.6 GB and CP 10.5 GB. 

To close the computational assessment of the new for- 
malism, we illustrate our method (using fx = 5) for 
the calculation of the infrared spectrum of a prototype 
molecule, Ceo ■ The calculated IR spectra is in very good 
agreement with the experiment (see Fig. [3J for low and 
high energy peaks (which are the most sensitive to fx as 
seen in Fig. [I}. The result is robust and independent of 
the initial condition of the simulation. The low energy 
splitting of IR spectrum starts to be resolved for simula- 
tions longer than 2 [ps] . 

In conclusion, we have presented a new approach to 
AIMD based on a generalization of TDDFT Ehrenfest 
dynamics. Our approach introduces a parameter fx that 
controls the trade-off between the closeness of the sim- 
ulation to the gsBO surface and the numerical cost of 
the calculation, analogously to the role of the fictitious 
electronic mass in CP. We have made direct comparisons 
of the numerical performance with CP, and, while quan- 
titatively our results are system- and implementation- 
dependent, they prove that our method can outperform 
CP in some relevant cases, namely, for large scale systems 
that are of interest in several research areas and that can 
only be studied from first principles MD in massively 
parallel computers. To increase its applicability, it would 
also be important to study if the improvements developed 
to optimize CP can be combined with our approach [f|, 
in particular, techniques to treat small-gap or metallic 
systems [l8j . 

Finally, note that the introduction of the parameter 
ix comes at a cost, as we change the time scale of the 
movements of the electrons with respect to the Ehrenfest 
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FIG. 2: Computational performance comparisons of our 
method and CP for an array of benzene molecules with finite 
boundary conditions and a spacing of 0.6 [a.u.]. Performance 
is measured as the computational time required to propagate 
one atomic unit of time, a) Single processor computational 
cost for different system sizes, (inset) Polynomial extrapo- 
lation for larger systems. Performed in one core of an Intel 
Xeon E5435 processor, b) Parallel computational cost for dif- 
ferent system sizes. Performed in 32 x Intel Itanium 2 (1.66 
GHz) processor cores of a SGI Altix. c) Parallel scaling with 
respect to the number of processor for a system of 480 atoms 
in a SGI Altix system. In both cases a mixed states-domain 
parallelization is used to maximize the performance. 



case, which implies a shift in the electronic excitation 
energies. This must be taken into account when we ex- 
tend the applicability of our method for non-adiabatic 
MD and MD under electromagnetic fields, in particular 
for the case of Raman spectroscopy, general resonant vi- 
brational spectroscopy, and laser induced molecular bond 
rearrangement (work in progress). 
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FIG. 3: Infrared spectrum of Ceo- The (blue) dashed line cor- 
responds to the calculated one (/^ = 5 and 2 [ps] of time) while 
the black bars are the experimental values from Ref. [17j |. 



ees for suggesting interesting ideas to extend the appli- 
cability of the method. 
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